Feature Subset Selection and Order Identification for Unsupervised Learning
Abstract
This paper explores the problem of feature subset selection for unsupervised learning within the wrapper framework. In particular, we examine feature subset selection wrapped around expectation-maximization (EM) clustering with order identification (identifying the number of clusters in the data). We investigate two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. When the "true" number of clusters k is unknown, our experiments on simulated Gaussian data and real data sets show that incorporating the search for k within the feature selection procedure obtains better "class" accuracy than fixing k to be the number of classes. There are two reasons: 1) the "true" number of Gaussian components is not necessarily equal to the number of classes and 2) clustering with different feature subsets can result in different numbers of "true" clusters. Our empirical evaluation shows that feature selection reduces the number of features and improves clustering performance with respect to the chosen performance criteria.
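As a rough illustration of the scatter separability criterion named in the abstract, the sketch below scores a candidate feature subset by trace(S_w^{-1} S_b), where S_w is the within-cluster scatter and S_b the between-cluster scatter. The abstract does not give the formula, so this particular form is an assumption, as are the function name `scatter_separability` and the synthetic data. In the wrapper setting the labels would come from EM clustering on each candidate subset; here they are fixed by hand to keep the example short.

```python
import numpy as np

def scatter_separability(X, labels):
    """Return trace(S_w^{-1} S_b) for data X (n_samples, n_features)
    and hard cluster labels. Assumed form of the criterion."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-cluster scatter
    S_b = np.zeros((d, d))  # between-cluster scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        prior = len(Xc) / n
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        # prior-weighted covariance of cluster c
        S_w += prior * (centered.T @ centered) / len(Xc)
        diff = (mean_c - overall_mean)[:, None]
        S_b += prior * (diff @ diff.T)
    # The pseudo-inverse guards against a singular S_w.
    return float(np.trace(np.linalg.pinv(S_w) @ S_b))

# Hypothetical data: one feature separates two clusters, one is pure noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
informative = np.where(labels == 0, -3.0, 3.0) + rng.normal(size=400)
noise = rng.normal(size=400)
X = np.column_stack([informative, noise])

score_informative = scatter_separability(X[:, [0]], labels)
score_noise = scatter_separability(X[:, [1]], labels)
print(score_informative, score_noise)
```

A wrapper search would call such a criterion once per candidate subset, after re-clustering on that subset, and keep the subset with the highest score.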
Similar resources
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Learning Word Sense With Feature Selection and Order Identification Capabilities
This paper presents an unsupervised word sense learning algorithm, which induces senses of target word by grouping its occurrences into a “natural” number of clusters based on the similarity of their contexts. For removing noisy words in feature set, feature selection is conducted by optimizing a cluster validation criterion subject to some constraint in an unsupervised manner. Gaussian mixture...
A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems
Intrusion detection systems are designed to provide security in computer networks, so that if the attacker crosses other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...
Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm
This paper describes a novel feature selection algorithm for unsupervised clustering, that combines the clustering ensembles method and the population based incremental learning algorithm. The main idea of the proposed unsupervised feature selection algorithm is to search for a subset of all features such that the clustering algorithm trained on this feature subset can achieve the most similar ...
Publication date: 2000